try-election-with-applied#35283
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the Raft leader election mechanism by prioritizing candidates with higher applied progress. Key changes include extending the SyncRequestVote structure to carry the candidateAppliedIndex, implementing comparison logic during vote evaluation, and adding extensive test coverage across unit and cluster-level tests. Feedback highlights a critical concern regarding the breaking change to the RPC protocol, which may hinder rolling upgrades. Additionally, the reviewer warned that enforcing appliedIndex as a mandatory voting condition could risk election deadlocks and suggested treating it as a preference. There is also a note about reduced debuggability in the test library due to simplified JSON serialization.
| SyncTerm term; | ||
| SyncIndex lastLogIndex; | ||
| SyncTerm lastLogTerm; | ||
| SyncIndex candidateAppliedIndex; |
There was a problem hiding this comment.
Adding candidateAppliedIndex to the SyncRequestVote struct changes its memory layout and size. This is a breaking change for the RPC protocol and will prevent rolling upgrades in a distributed cluster. Consider using a versioned message format or appending new fields in a way that maintains backward compatibility.
| bool grant = (pMsg->term == currentTerm) && logOK && appliedOK && | ||
| ((!hasVoted) || syncUtilSameId(&ths->raftStore.voteFor, &pMsg->srcId)); |
There was a problem hiding this comment.
Making appliedOK a mandatory condition for granting a vote (&& appliedOK) poses a risk to cluster availability. If a node with the most up-to-date log is lagging in its FSM application (lower appliedIndex), it will be rejected by nodes that have applied more, even if those nodes have stale logs. This can lead to election deadlocks. To preserve Raft liveness, this should be a preference (e.g., via randomized delays) rather than a hard requirement for voting.
| snprintf(ptrbuf, sizeof(ptrbuf), "%p", pCache); | ||
| cJSON_AddStringToObject(pRoot, "opaqueCache", ptrbuf); |
There was a problem hiding this comment.
Pull request overview
This pull request implements an applied-index-aware Raft leader election preference in the sync subsystem by extending the SyncRequestVote contract to carry candidateAppliedIndex, using that value in vote-grant decisions, and adding unit/cluster-style regressions plus supporting debug/docs updates.
Changes:
- Extend
SyncRequestVotewithcandidateAppliedIndexand populate/log it during RequestVote send/receive. - Add vote-grant logic that rejects candidates whose applied index is behind the local node (with fallback when applied index is unavailable).
- Add/adjust sync unit tests (RequestVote serialization/decision, election preference, deterministic tie reset) and a pytest cluster regression; add design/plan docs.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/pytest/cluster/syncingTest.py | Adds timing + lagging-node regression scenario for replica-3 clusters |
| tests/pytest/cluster/clusterSetup.py | Adds node1/node2/node3 convenience fields derived from tdnodes |
| TD-xxx-try-election-with-applied | Adds a one-line instruction/placeholder text file |
| source/libs/sync/test/syncVotesGrantedTest.cpp | Updates syncNodeOpen call signature usage and adds deterministic tie-reset regression |
| source/libs/sync/test/syncRequestVoteTest.cpp | Adds applied-index round-trip + applied-index decision tests |
| source/libs/sync/test/syncElectTest.cpp | Adds a local in-process election harness and validates higher-applied node wins |
| source/libs/sync/test/sync_test_lib/src/syncSnapshotDebug.c | Updates snapshot debug JSON to new sender/receiver fields |
| source/libs/sync/test/sync_test_lib/src/syncRaftEntryDebug.c | Makes raft cache debug output opaque (pointer-only) |
| source/libs/sync/test/sync_test_lib/src/syncMessageDebug.c | Adds message-type compatibility macros + RequestVote builder with applied index + JSON field |
| source/libs/sync/test/sync_test_lib/src/syncMainDebug.c | Fixes build by casting for raftStoreGetTerm in const debug path |
| source/libs/sync/test/sync_test_lib/src/syncIO.c | Updates test IO helpers to new queue/Qall allocation APIs and version string usage |
| source/libs/sync/test/sync_test_lib/inc/syncTest.h | Declares new RequestVote helper; makes raft cache types opaque for tests |
| source/libs/sync/src/syncUtil.c | Extends RequestVote send/recv logging to include applied index |
| source/libs/sync/src/syncRequestVote.c | Adds applied-index gate into vote granting |
| source/libs/sync/src/syncMessage.c | Initializes candidateAppliedIndex on RequestVote build |
| source/libs/sync/src/syncMain.c | Adds syncNodeGetAppliedIndex() accessor via FSM callback |
| source/libs/sync/src/syncElection.c | Populates candidateAppliedIndex into outgoing RequestVote messages and uses unified logging |
| source/libs/sync/inc/syncUtil.h | Notes RequestVote logs now include applied-progress context |
| source/libs/sync/inc/syncRequestVote.h | Exposes syncNodeOnRequestVoteAppliedIndexOK() |
| source/libs/sync/inc/syncMessage.h | Adds candidateAppliedIndex to SyncRequestVote struct |
| source/libs/sync/inc/syncInt.h | Declares syncNodeGetAppliedIndex() |
| docs/001-raft-appliedindex-election/tasks.md | Adds task breakdown and execution order for the feature |
| docs/001-raft-appliedindex-election/spec.md | Adds feature spec and acceptance scenarios |
| docs/001-raft-appliedindex-election/research.md | Documents design decisions and non-goals |
| docs/001-raft-appliedindex-election/quickstart.md | Adds build/run/validation steps |
| docs/001-raft-appliedindex-election/plan.md | Adds implementation plan and repo structure notes |
| docs/001-raft-appliedindex-election/data-model.md | Defines message/data entities and validation rules |
| docs/001-raft-appliedindex-election/contracts/request-vote-applied-index.md | Specifies the RequestVote applied-index preference contract |
| docs/001-raft-appliedindex-election/checklists/requirements.md | Adds requirements completeness checklist for the spec |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -41,7 +74,15 @@ def run(self): | |||
|
|
|||
| tdSql.execute("alter table meters add col col5 int") | |||
| tdSql.execute("alter table meters drop col col5 int") | |||
| deadline = time.time() + 30 | ||
| replica_ready = False | ||
| while time.time() < deadline: | ||
| tdSql.query("show %s.vgroups" % db_name) | ||
| if tdSql.queryRows > 0: | ||
| replica_ready = True | ||
| break | ||
| time.sleep(1) |
| sNInfo(ths, | ||
| "appliedok:1, fallback due to unavailable applied index, {my-applied:%" PRId64 ", recv-applied:%" PRId64 | ||
| ", recv-term:%" PRIu64 "}", | ||
| *pLocalApplied, pMsg->candidateAppliedIndex, pMsg->term); | ||
| return true; |
| sNWarn(ths, | ||
| "appliedok:0, candidate applied index behind local state, {my-applied:%" PRId64 ", recv-applied:%" PRId64 | ||
| ", recv-term:%" PRIu64 "}", | ||
| *pLocalApplied, pMsg->candidateAppliedIndex, pMsg->term); | ||
| return false; |
| sNInfo(ths, | ||
| "appliedok:1, candidate applied index acceptable, {my-applied:%" PRId64 ", recv-applied:%" PRId64 | ||
| ", recv-term:%" PRIu64 "}", | ||
| *pLocalApplied, pMsg->candidateAppliedIndex, pMsg->term); | ||
| return true; |
| typedef struct SyncPing SyncPing; | ||
| typedef struct SyncPingReply SyncPingReply; | ||
| typedef void SRaftEntryHashCache; | ||
| typedef void SRaftEntryCache; |
| ", sby:%d, " | ||
| "r-num:%d, " | ||
| "lcfg:%" PRId64 ", chging:%d, rsto:%d", | ||
| pSyncNode->vgId, syncStr(pSyncNode->state), raftStoreGetTerm(pSyncNode), pSyncNode->commitIndex, | ||
| pSyncNode->vgId, syncStr(pSyncNode->state), raftStoreGetTerm((SSyncNode*)pSyncNode), pSyncNode->commitIndex, | ||
| logBeginIndex, logLastIndex, snapshot.lastApplyIndex, pSyncNode->raftCfg.isStandBy, pSyncNode->replicaNum, |
| Update the sync runtime in these areas: | ||
|
|
||
| - Add the applied-index field to `SyncRequestVote` in `community/source/libs/sync/inc/syncMessage.h`. | ||
| - Populate the field in `community/source/libs/sync/src/syncElection.c` when building vote requests. | ||
| - Read local applied progress from `FpAppliedIndexCb` in `community/source/libs/sync/src/syncRequestVote.c` and incorporate it after the existing log-recency gate. | ||
| - Extend request-vote logging in `community/source/libs/sync/src/syncUtil.c` or adjacent sync logging paths so operators can see the comparison inputs. |
| @@ -0,0 +1 @@ | |||
| 改进目前raft的选举实现,加入对appliedIndex的判断,appliedIndex大的更优先被选举为leader。不要替我创建git分支。 No newline at end of file | |||
| uint16_t gPorts[] = {7010, 7110, 7210, 7310, 7410}; | ||
| const char* gDir = "./syncElectTest"; | ||
| int32_t gVgId = 1234; | ||
| static SyncIndex electAppliedIndexCb(const SSyncFSM* pFsm) { return ((const ElectTestFsm*)pFsm)->appliedIndex; } |
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.